AB The ability to use proteins in nonnatural environments greatly expands
their potential applications in biotechnology. Because nature has not 
paid much attention to optimizing proteins for in vitro applications under 
conditions that differ substantially from their natural surroundings, 
there is generally room for improvement through alterations in the amino 
acid sequence. The most effective approach to this protein engineering 
task depends on the level to which the molecular basis for the desired 
property is understood. Consistently successful "rational" design using 
site-directed mutagenesis requires a high level of understanding of
structure and mechanisms or, alternatively, a particularly simple strategy 
for obtaining the desired feature. An example of a generally applicable 
and easy-to-implement protein stabilization strategy is metal ion 
chelation by specific surface dihistidine sites, which can affect thermal 
stability as well as the protein's ability to withstand denaturants such 
as guanidinium chloride. Random mutagenesis, on the other hand, can be 
effective even when structure or mechanisms are poorly understood, 
provided one can conveniently screen or select for the property of 
interest. This approach is illustrated by the sequential accumulation of 
random mutations that greatly enhance the catalytic activity of 
a serine protease, subtilisin E, in polar organic
solvents. The random mutagenesis approach, which mimics the natural
evolutionary refinement process, can be used to "coax" enzymes into 
tolerating nonnatural environments. 
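
The sequential accumulation of random mutations described in this abstract can be illustrated with a toy loop: each round builds a library of single mutants, screens it, and carries the best variant forward. This is only a minimal sketch under stated assumptions. The `toy_screen` function, the `TARGET` string, the library size, and the number of rounds are all hypothetical stand-ins and do not come from the subtilisin E work.

```python
import random

AMINO_ACIDS = "ACDEFGHIKLMNPQRSTVWY"

def mutate(seq, rng):
    """Return a copy of seq with one randomly chosen position substituted."""
    pos = rng.randrange(len(seq))
    new = rng.choice([a for a in AMINO_ACIDS if a != seq[pos]])
    return seq[:pos] + new + seq[pos + 1:]

def evolve(parent, screen, rounds=5, library_size=50, seed=0):
    """Accumulate mutations sequentially.

    Each round screens a library of single mutants of the current best
    sequence and keeps the top variant only if it beats the current score.
    """
    rng = random.Random(seed)
    best, best_score = parent, screen(parent)
    history = [best_score]
    for _ in range(rounds):
        library = [mutate(best, rng) for _ in range(library_size)]
        top = max(library, key=screen)
        if screen(top) > best_score:
            best, best_score = top, screen(top)
        history.append(best_score)
    return best, history

# Hypothetical screen: counts positions that match an arbitrary target string.
# A real screen would be an activity assay in the solvent of interest.
TARGET = "MKTAYIAKQR"

def toy_screen(seq):
    return sum(a == b for a, b in zip(seq, TARGET))

if __name__ == "__main__":
    best, history = evolve("AAAAAAAAAA", toy_screen)
    print(best, history)
```

Because a variant is accepted only when it improves the screened property, the recorded scores never decrease from round to round, which mirrors the stepwise refinement the abstract describes.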
TI [Thermitase, a thermostable serine protease of Thermoactinomyces
vulgaris: interaction of the active center and the SH-group of the enzyme].
Thermitase, eine thermostabile Serin-Protease aus Thermoactinomyces
vulgaris: Wechselwirkung zwischen aktivem Zentrum und SH-Gruppe des
Enzyms.

AU Hansen G; Frommel C; Hausdorf G; Bauer S 

SO Acta biologica et medica Germanica, (1982) 41 (2-3) 137-44. 

Journal code: 0370276. ISSN: 0001-5318. 
DT Journal; Article; (JOURNAL ARTICLE)
FS Priority Journals
EM 198210 

ED Entered STN: 19900317 

Last Updated on STN: 20000303 

Entered Medline: 19821021 
AB Modification of the serine and histidine residues in the active centre of
thermitase with diisopropyl fluorophosphate (DFP) or
L-1-tosylamide-2-phenylethyl chloromethyl ketone (TPCK), and of the only
SH-group of the enzyme with Hg compounds, causes a loss of activity in the
hydrolysis of 4-nitrophenyl acetate. While the modification of cysteine
prevents reaction of serine and histidine in the active centre of the enzyme
with DFP and TPCK, respectively, the Hg2+- and CF3Hg+-binding to the SH-group
is retained after modification of essential amino acid residues in the active
centre. To elucidate the interaction of the SH-group with the active centre,
the modified products of thermitase were investigated for their
thermostability. Ca2+ ions were found to have a stabilizing effect on all the
modified products of thermitase, as well as on the native enzyme. Simultaneous
modification of the cysteine and serine leads to an increase in
thermostability of thermitase, whilst double modification at the cysteine and
histidine causes destabilization of the enzyme.
RESULT 1
US-09-027-337-2 

; Sequence 2, Application US/09027337B
; Patent No. 5972616
; GENERAL INFORMATION:
; APPLICANT: O'Brien, Timothy J.
; APPLICANT: Tanimoto, Hirotoshi
; TITLE OF INVENTION: TADG-15: An Extracellular Serine Protease Overexpressed in
; TITLE OF INVENTION: Breast and Ovarian Carcinomas
; FILE REFERENCE: D6064
; CURRENT APPLICATION NUMBER: US/09/027,337B
; CURRENT FILING DATE: 1998-02-20
; NUMBER OF SEQ ID NOS: 13
; SEQ ID NO 2
; LENGTH: 855
; TYPE: PRT
; ORGANISM: Homo sapiens
; FEATURE:
; OTHER INFORMATION: Amino acid sequence of TADG-15 encoded by nucleotides
; OTHER INFORMATION: 23 to 2589 of Sequence 1
; Patent No. 5972616 
US-09-027-337-2 

Query Match 100.0%; Score 4681; DB 2; Length 855; 

Best Local Similarity 100.0%; Pred. No. 0; 

Matches 855; Conservative 0; Mismatches 0; Indels 0; Gaps 0; 
Qy    1 MGSDRARKGGGGPKDFGAGLKYNSRHEKVNGLEEGVEFLPVNNVKKVEKHGPGRWVVLAA 60
        ||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||
Db    1 MGSDRARKGGGGPKDFGAGLKYNSRHEKVNGLEEGVEFLPVNNVKKVEKHGPGRWVVLAA 60

Qy   61 VLIGLLLVLLGIGFLVWHLQYRDVRVQKVFNGYMRITNENFVDAYENSNSTEFVSLASKV 120
        ||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||
Db   61 VLIGLLLVLLGIGFLVWHLQYRDVRVQKVFNGYMRITNENFVDAYENSNSTEFVSLASKV 120

Qy  121 KDALKLLYSGVPFLGPYHKESAVTAFSEGSVIAYYWSEFSIPQHLVEEAERVMAEERVVM 180
        ||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||
Db  121 KDALKLLYSGVPFLGPYHKESAVTAFSEGSVIAYYWSEFSIPQHLVEEAERVMAEERVVM 180

Qy  181 LPPRARSLKSFVVTSVVAFPTDSKTVQRTQDNSCSFGLHARGVELMRFTTPGFPDSPYPA 240
        ||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||
Db  181 LPPRARSLKSFVVTSVVAFPTDSKTVQRTQDNSCSFGLHARGVELMRFTTPGFPDSPYPA 240

Qy  241 HARCQWALRGDADSVLSLTFRSFDLASCDERGSDLVTVYNTLSPMEPHALVQLCGTYPPS 300
        ||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||
Db  241 HARCQWALRGDADSVLSLTFRSFDLASCDERGSDLVTVYNTLSPMEPHALVQLCGTYPPS 300

Qy  301 YNLTFHSSQNVLLITLITNTERRHPGFEATFFQLPRMSSCGGRLRKAQGTFNSPYYPGHY 360
        ||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||
Db  301 YNLTFHSSQNVLLITLITNTERRHPGFEATFFQLPRMSSCGGRLRKAQGTFNSPYYPGHY 360

Qy  361 PPNIDCTWNIEVPNNQHVKVSFKFFYLLEPGVPAGTCPKDYVEINGEKYCGERSQFVVTS 420
        ||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||
Db  361 PPNIDCTWNIEVPNNQHVKVSFKFFYLLEPGVPAGTCPKDYVEINGEKYCGERSQFVVTS 420

Qy  421 NSNKITVRFHSDQSYTDTGFLAEYLSYDSSDPCPGQFTCRTGRCIRKELRCDGWADCTDH 480
        ||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||
Db  421 NSNKITVRFHSDQSYTDTGFLAEYLSYDSSDPCPGQFTCRTGRCIRKELRCDGWADCTDH 480

Qy  481 SDELNCSCDAGHQFTCKNKFCKPLFWVCDSVNDCGDNSDEQGCSCPAQTFRCSNGKCLSK 540
        ||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||
Db  481 SDELNCSCDAGHQFTCKNKFCKPLFWVCDSVNDCGDNSDEQGCSCPAQTFRCSNGKCLSK 540

Qy  541 SQQCNGKDDCGDGSDEASCPKVNVVTCTKHTYRCLNGLCLSKGNPECDGKEDCSDGSDEK 600
        ||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||
Db  541 SQQCNGKDDCGDGSDEASCPKVNVVTCTKHTYRCLNGLCLSKGNPECDGKEDCSDGSDEK 600

Qy  601 DCDCGLRSFTRQARVVGGTDADEGEWPWQVSLHALGQGHICGASLISPNWLVSAAHCYID 660
        ||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||
Db  601 DCDCGLRSFTRQARVVGGTDADEGEWPWQVSLHALGQGHICGASLISPNWLVSAAHCYID 660

Qy  661 DRGFRYSDPTQWTAFLGLHDQSQRSAPGVQERRLKRIISHPFFNDFTFDYDIALLELEKP 720
        ||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||
Db  661 DRGFRYSDPTQWTAFLGLHDQSQRSAPGVQERRLKRIISHPFFNDFTFDYDIALLELEKP 720

Qy  721 AEYSSMVRPICLPDASHVFPAGKAIWVTGWGHTQYGGTGALILQKGEIRVINQTTCENLL 780
        ||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||
Db  721 AEYSSMVRPICLPDASHVFPAGKAIWVTGWGHTQYGGTGALILQKGEIRVINQTTCENLL 780

Qy  781 PQQITPRMMCVGFLSGGVDSCQGDSGGPLSSVEADGRIFQAGVVSWGDGCAQRNKPGVYT 840
        ||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||
Db  781 PQQITPRMMCVGFLSGGVDSCQGDSGGPLSSVEADGRIFQAGVVSWGDGCAQRNKPGVYT 840

Qy  841 RLPLFRDWIKENTGV 855
        |||||||||||||||
Db  841 RLPLFRDWIKENTGV 855



RESULT 2 
US-09-644-600-2 

; Sequence 2, Application US/09644600
; Patent No. 6451500
; GENERAL INFORMATION:
; APPLICANT: O'Brien, Timothy J.
; APPLICANT: Tanimoto, Hirotoshi
; TITLE OF INVENTION: TADG-15: An Extracellular Serine Protease
; TITLE OF INVENTION: Overexpressed in Carcinomas
; FILE REFERENCE: D6064CIP/D
; CURRENT APPLICATION NUMBER: US/09/644,600
; CURRENT FILING DATE: 2000-08-23
; PRIOR APPLICATION NUMBER: 09/421,213
; PRIOR FILING DATE: 1999-10-20
; PRIOR APPLICATION NUMBER: 09/027,337
; PRIOR FILING DATE: 1998-02-20
; NUMBER OF SEQ ID NOS: 98
; SEQ ID NO 2
; LENGTH: 855
; TYPE: PRT
; ORGANISM: Homo sapiens
; FEATURE:
; OTHER INFORMATION: TADG-15
US-09-644-600-2 



Query Match 100.0%; Score 4681; DB 4; Length 855; 

Best Local Similarity 100.0%; Pred. No. 0; 

Matches 855; Conservative 0; Mismatches 0; Indels 0; Gaps 0; 
Qy    1 MGSDRARKGGGGPKDFGAGLKYNSRHEKVNGLEEGVEFLPVNNVKKVEKHGPGRWVVLAA 60
        ||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||
Db    1 MGSDRARKGGGGPKDFGAGLKYNSRHEKVNGLEEGVEFLPVNNVKKVEKHGPGRWVVLAA 60

Qy   61 VLIGLLLVLLGIGFLVWHLQYRDVRVQKVFNGYMRITNENFVDAYENSNSTEFVSLASKV 120
        ||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||
Db   61 VLIGLLLVLLGIGFLVWHLQYRDVRVQKVFNGYMRITNENFVDAYENSNSTEFVSLASKV 120

Qy  121 KDALKLLYSGVPFLGPYHKESAVTAFSEGSVIAYYWSEFSIPQHLVEEAERVMAEERVVM 180
        ||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||
Db  121 KDALKLLYSGVPFLGPYHKESAVTAFSEGSVIAYYWSEFSIPQHLVEEAERVMAEERVVM 180

Qy  181 LPPRARSLKSFVVTSVVAFPTDSKTVQRTQDNSCSFGLHARGVELMRFTTPGFPDSPYPA 240
        ||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||
Db  181 LPPRARSLKSFVVTSVVAFPTDSKTVQRTQDNSCSFGLHARGVELMRFTTPGFPDSPYPA 240

Qy  241 HARCQWALRGDADSVLSLTFRSFDLASCDERGSDLVTVYNTLSPMEPHALVQLCGTYPPS 300
        ||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||
Db  241 HARCQWALRGDADSVLSLTFRSFDLASCDERGSDLVTVYNTLSPMEPHALVQLCGTYPPS 300

Qy  301 YNLTFHSSQNVLLITLITNTERRHPGFEATFFQLPRMSSCGGRLRKAQGTFNSPYYPGHY 360
        ||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||
Db  301 YNLTFHSSQNVLLITLITNTERRHPGFEATFFQLPRMSSCGGRLRKAQGTFNSPYYPGHY 360

Qy  361 PPNIDCTWNIEVPNNQHVKVSFKFFYLLEPGVPAGTCPKDYVEINGEKYCGERSQFVVTS 420
        ||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||
Db  361 PPNIDCTWNIEVPNNQHVKVSFKFFYLLEPGVPAGTCPKDYVEINGEKYCGERSQFVVTS 420

Qy  421 NSNKITVRFHSDQSYTDTGFLAEYLSYDSSDPCPGQFTCRTGRCIRKELRCDGWADCTDH 480
        ||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||
Db  421 NSNKITVRFHSDQSYTDTGFLAEYLSYDSSDPCPGQFTCRTGRCIRKELRCDGWADCTDH 480

Qy  481 SDELNCSCDAGHQFTCKNKFCKPLFWVCDSVNDCGDNSDEQGCSCPAQTFRCSNGKCLSK 540
        ||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||
Db  481 SDELNCSCDAGHQFTCKNKFCKPLFWVCDSVNDCGDNSDEQGCSCPAQTFRCSNGKCLSK 540

Qy  541 SQQCNGKDDCGDGSDEASCPKVNVVTCTKHTYRCLNGLCLSKGNPECDGKEDCSDGSDEK 600
        ||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||
Db  541 SQQCNGKDDCGDGSDEASCPKVNVVTCTKHTYRCLNGLCLSKGNPECDGKEDCSDGSDEK 600

Qy  601 DCDCGLRSFTRQARVVGGTDADEGEWPWQVSLHALGQGHICGASLISPNWLVSAAHCYID 660
        ||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||
Db  601 DCDCGLRSFTRQARVVGGTDADEGEWPWQVSLHALGQGHICGASLISPNWLVSAAHCYID 660

Qy  661 DRGFRYSDPTQWTAFLGLHDQSQRSAPGVQERRLKRIISHPFFNDFTFDYDIALLELEKP 720
        ||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||
Db  661 DRGFRYSDPTQWTAFLGLHDQSQRSAPGVQERRLKRIISHPFFNDFTFDYDIALLELEKP 720

Qy  721 AEYSSMVRPICLPDASHVFPAGKAIWVTGWGHTQYGGTGALILQKGEIRVINQTTCENLL 780
        ||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||
Db  721 AEYSSMVRPICLPDASHVFPAGKAIWVTGWGHTQYGGTGALILQKGEIRVINQTTCENLL 780

Qy  781 PQQITPRMMCVGFLSGGVDSCQGDSGGPLSSVEADGRIFQAGVVSWGDGCAQRNKPGVYT 840
        ||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||
Db  781 PQQITPRMMCVGFLSGGVDSCQGDSGGPLSSVEADGRIFQAGVVSWGDGCAQRNKPGVYT 840

Qy  841 RLPLFRDWIKENTGV 855
        |||||||||||||||
Db  841 RLPLFRDWIKENTGV 855



RESULT 3 

US-09-654-600A-2 

; Sequence 2, Application US/09654600A
; Patent No. 6649741
; GENERAL INFORMATION:
; APPLICANT: O'Brien, Timothy J.
; APPLICANT: Tanimoto, Hirotoshi
; TITLE OF INVENTION: TADG-15: An Extracellular Serine Protease
; TITLE OF INVENTION: Overexpressed in Carcinomas
; FILE REFERENCE: D6064CIP/D
; CURRENT APPLICATION NUMBER: US/09/654,600A
; CURRENT FILING DATE: 2000-09-01
; PRIOR APPLICATION NUMBER: 09/421,213
; PRIOR FILING DATE: 1999-10-20
; PRIOR APPLICATION NUMBER: 09/027,337
; PRIOR FILING DATE: 1998-02-20
; NUMBER OF SEQ ID NOS: 98
; SEQ ID NO 2
; LENGTH: 855
; TYPE: PRT
; ORGANISM: Homo sapiens
; FEATURE:
; OTHER INFORMATION: TADG-15
US-09-654-600A-2 



Query Match 100.0%; Score 4681; DB 4; Length 855;
Best Local Similarity 100.0%; Pred. No. 0;
Matches 855; Conservative 0; Mismatches 0; Indels 0; Gaps 0;

Qy    1 MGSDRARKGGGGPKDFGAGLKYNSRHEKVNGLEEGVEFLPVNNVKKVEKHGPGRWVVLAA 60
        ||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||
Db    1 MGSDRARKGGGGPKDFGAGLKYNSRHEKVNGLEEGVEFLPVNNVKKVEKHGPGRWVVLAA 60

Qy   61 VLIGLLLVLLGIGFLVWHLQYRDVRVQKVFNGYMRITNENFVDAYENSNSTEFVSLASKV 120
        ||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||
Db   61 VLIGLLLVLLGIGFLVWHLQYRDVRVQKVFNGYMRITNENFVDAYENSNSTEFVSLASKV 120

Qy  121 KDALKLLYSGVPFLGPYHKESAVTAFSEGSVIAYYWSEFSIPQHLVEEAERVMAEERVVM 180
        ||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||
Db  121 KDALKLLYSGVPFLGPYHKESAVTAFSEGSVIAYYWSEFSIPQHLVEEAERVMAEERVVM 180

Qy  181 LPPRARSLKSFVVTSVVAFPTDSKTVQRTQDNSCSFGLHARGVELMRFTTPGFPDSPYPA 240
        ||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||
Db  181 LPPRARSLKSFVVTSVVAFPTDSKTVQRTQDNSCSFGLHARGVELMRFTTPGFPDSPYPA 240

Qy  241 HARCQWALRGDADSVLSLTFRSFDLASCDERGSDLVTVYNTLSPMEPHALVQLCGTYPPS 300
        ||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||
Db  241 HARCQWALRGDADSVLSLTFRSFDLASCDERGSDLVTVYNTLSPMEPHALVQLCGTYPPS 300

Qy  301 YNLTFHSSQNVLLITLITNTERRHPGFEATFFQLPRMSSCGGRLRKAQGTFNSPYYPGHY 360
        ||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||
Db  301 YNLTFHSSQNVLLITLITNTERRHPGFEATFFQLPRMSSCGGRLRKAQGTFNSPYYPGHY 360

Qy  361 PPNIDCTWNIEVPNNQHVKVSFKFFYLLEPGVPAGTCPKDYVEINGEKYCGERSQFVVTS 420
        ||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||
Db  361 PPNIDCTWNIEVPNNQHVKVSFKFFYLLEPGVPAGTCPKDYVEINGEKYCGERSQFVVTS 420

Qy  421 NSNKITVRFHSDQSYTDTGFLAEYLSYDSSDPCPGQFTCRTGRCIRKELRCDGWADCTDH 480
        ||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||
Db  421 NSNKITVRFHSDQSYTDTGFLAEYLSYDSSDPCPGQFTCRTGRCIRKELRCDGWADCTDH 480

Qy  481 SDELNCSCDAGHQFTCKNKFCKPLFWVCDSVNDCGDNSDEQGCSCPAQTFRCSNGKCLSK 540
        ||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||
Db  481 SDELNCSCDAGHQFTCKNKFCKPLFWVCDSVNDCGDNSDEQGCSCPAQTFRCSNGKCLSK 540

Qy  541 SQQCNGKDDCGDGSDEASCPKVNVVTCTKHTYRCLNGLCLSKGNPECDGKEDCSDGSDEK 600
        ||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||
Db  541 SQQCNGKDDCGDGSDEASCPKVNVVTCTKHTYRCLNGLCLSKGNPECDGKEDCSDGSDEK 600

Qy  601 DCDCGLRSFTRQARVVGGTDADEGEWPWQVSLHALGQGHICGASLISPNWLVSAAHCYID 660
        ||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||
Db  601 DCDCGLRSFTRQARVVGGTDADEGEWPWQVSLHALGQGHICGASLISPNWLVSAAHCYID 660

Qy  661 DRGFRYSDPTQWTAFLGLHDQSQRSAPGVQERRLKRIISHPFFNDFTFDYDIALLELEKP 720
        ||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||
Db  661 DRGFRYSDPTQWTAFLGLHDQSQRSAPGVQERRLKRIISHPFFNDFTFDYDIALLELEKP 720

Qy  721 AEYSSMVRPICLPDASHVFPAGKAIWVTGWGHTQYGGTGALILQKGEIRVINQTTCENLL 780
        ||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||
Db  721 AEYSSMVRPICLPDASHVFPAGKAIWVTGWGHTQYGGTGALILQKGEIRVINQTTCENLL 780

Qy  781 PQQITPRMMCVGFLSGGVDSCQGDSGGPLSSVEADGRIFQAGVVSWGDGCAQRNKPGVYT 840
        ||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||
Db  781 PQQITPRMMCVGFLSGGVDSCQGDSGGPLSSVEADGRIFQAGVVSWGDGCAQRNKPGVYT 840

Qy  841 RLPLFRDWIKENTGV 855
        |||||||||||||||
Db  841 RLPLFRDWIKENTGV 855



RESULT 1 
JC7731 

membrane-bound arginine-specific serine proteinase precursor - rat
C;Species: Rattus norvegicus (Norway rat)
C;Date: 14-Dec-2001 #sequence_revision 14-Dec-2001 #text_change 03-Feb-2003
C;Accession: JC7731; JC7775
R;Kishi, K.; Yamazaki, K.; Yasuda, I.; Yahagi, N.; Ichinose, M.; Tsuchiya, Y.;
Athauda, S.B.P.; Inoue, H.; Takahashi, K.
J. Biochem. 130, 425-430, 2001
A;Title: Characterization of a membrane-bound arginine-specific serine protease
from rat intestinal mucosa.
A;Reference number: JC7731; MUID:21421307; PMID:11530019
A;Accession: JC7731
A;Molecule type: mRNA
A;Residues: 1-855 <KIS>
A;Cross-references: DDBJ:AB049189
A;Experimental source: strain Male, 7-week-old
R;Satomi, S.; Yamasaki, Y.; Tsuzuki, S.; Hitomi, Y.; Iwanaga, T.; Fushiki, T.
Biochem. Biophys. Res. Commun. 287, 995-1002, 2001
A;Title: A role for membrane-type serine protease (MT-SP1) in intestinal
epithelial turnover.
A;Reference number: JC7775; PMID:11573963
A;Contents: Small intestine
A;Accession: JC7775
A;Molecule type: mRNA
A;Residues: 1-855 <SAT>
A;Cross-references: DDBJ:AB037898
C;Comment: This enzyme is an epithelial-derived, type II integral membrane serine
protease. It is localized mainly on brush-border membranes of the intestine and
participates in the processing or digestion of specific proteins or peptides on
the brush-border membranes. It also participates in the control of intestinal
epithelial turnover by regulating the cell-substratum adhesion associated with
epithelial migration and/or cell loss.
C;Genetics:
A;Gene: mt-sp1
A;Map position: basolateral cell surface
C;Superfamily: membrane-bound arginine-specific serine proteinase
C;Keywords: protein digestion

Query Match 83.0%; Score 3883; DB 2; Length 855; 

Best Local Similarity 81.1%; Pred. No. 4.5e-247; 

Matches 693; Conservative 79; Mismatches 83; Indels 0; Gaps 0; 
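
The percentages in these statistics lines appear to be simple ratios, and the sketch below reproduces them. It rests on two assumptions inferred from the numbers in this report rather than from any documentation of the search program: that "Query Match" is the hit score divided by the query self-score (4681, the score of the 100.0% hits), and that "Best Local Similarity" is the number of identical positions divided by all aligned positions. The function names are illustrative.

```python
# Self-match score of the 855-residue query, taken from the 100.0% hits
# listed in this report (assumed to be the denominator of "Query Match").
SELF_SCORE = 4681

def query_match_percent(score, self_score=SELF_SCORE):
    """Hit score as a percentage of the query self-score, to one decimal."""
    return round(100.0 * score / self_score, 1)

def local_similarity_percent(matches, conservative, mismatches, indels=0):
    """Identical positions as a percentage of all aligned positions."""
    aligned = matches + conservative + mismatches + indels
    return round(100.0 * matches / aligned, 1)

if __name__ == "__main__":
    # Rat entry JC7731: Score 3883; Matches 693; Conservative 79; Mismatches 83
    print(query_match_percent(3883))              # 83.0
    print(local_similarity_percent(693, 79, 83))  # 81.1
```

Under these assumptions the rat hit gives 83.0% and 81.1%, in agreement with the values printed above, and the three position counts sum to the full query length of 855.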



Qy    1 MGSDRARKGGGGPKDFGAGLKYNSRHEKVNGLEEGVEFLPVNNVKKVEKHGPGRWVVLAA 60
Db    1 MGNNRGRKAGGGSQDFGAGLKYNSRLENMNGFEEGVEFLPVNNAKQVEKRGPRRWVVMVA 60

Qy   61 VLIGLLLVLLGIGFLVWHLQYRDVRVQKVFNGYMRITNENFVDAYENSNSTEFVSLASKV 120
Db   61 VVFSFLLLSLMAGLLVWHFHYRNVRIQKVFNGHLRITNENFLDAYENSTSTEFISLASQV 120

Qy  121 KDALKLLYSGVPFLGPYHKESAVTAFSEGSVIAYYWSEFSIPQHLVEEAERVMAEERVVM 180
Db  121 KEALKLMYSEVPVLGPYHKKSTVTAFSEGSVIAYYWSEFSIPPHLEEEVDRAMAVERVVT 180

Qy  181 LPPRARSLKSFVVTSVVAFPTDSKTVQRTQDNSCSFGLHARGVELMRFTTPGFPDSPYPA 240
Db  181 LPPRARALKSFVLTSVVAFPIDPRMLQRTQDNSCSFALHARGRTVTRFTTPGFPNSPYPA 240

Qy  241 HARCQWALRGDADSVLSLTFRSFDLASCDERGSDLVTVYNTLSPMEPHALVQLCGTYPPS 300
Db  241 HARCQWVLRGDADSVLSLTFRSFDVAPCDGHDSDLVTVYDSLSPMEPHAVVRLCGTFSPS 300

Qy  301 YNLTFHSSQNVLLITLITNTERRHPGFEATFFQLPRMSSCGGRLRKAQGTFNSPYYPGHY 360
Db  301 YNLTFLSSQNVFLVTLITNTDRRHPGFEATFFQLPKMSSCGGLLSEAQGTFSSPYYPGHY 360

Qy  361 PPNIDCTWNIEVPNNQHVKVSFKFFYLLEPGVPAGTCPKDYVEINGEKYCGERSQFVVTS 420
Db  361 PPNINCTWNIKVPNNRNVKVRFKLFYLVDPNIPVGSCTKDYVEINGEKFCGERSQFVVSS 420

Qy  421 NSNKITVRFHSDQSYTDTGFLAEYLSYDSSDPCPGQFTCRTGRCIRKELRCDGWADCTDH 480
Db  421 NSSKITVHFHSDHSYTDTGFLAEYLSYDSNDPCPGMFMCKTGRCIRKDLRCDGWADCPDY 480

Qy  481 SDELNCSCDAGHQFTCKNKFCKPLFWVCDSVNDCGDNSDEQGCSCPAQTFRCSNGKCLSK 540
Db  481 SDERHCRCNATHQFMCKNQFCKPLFWVCDSVNDCGDGSDEEGCSCPAGSFKCSNGKCLPQ 540

Qy  541 SQQCNGKDDCGDGSDEASCPKVNVVTCTKHTYRCLNGLCLSKGNPECDGKEDCSDGSDEK 600
Db  541 SQQCNGKDDCGDGSDEASCDNVNAVSCTKYTYRCQMGLCLNKGNPECDGKKDCSDGSDEK 600

Qy  601 DCDCGLRSFTRQARVVGGTDADEGEWPWQVSLHALGQGHICGASLISPNWLVSAAHCYID 660
Db  601 NCDCGLRSFTKQARVVGGTNADEGEWPWQVSLHALGQGHLCGASLISPDWLVSAAHCFQD 660

Qy  661 DRGFRYSDPTQWTAFLGLHDQSQRSAPGVQERRLKRIISHPFFNDFTFDYDIALLELEKP 720
Db  661 ETIFKYSDHTMWTAFLGLLDQSKRSASGVQEHKLKRIITHPSFNDFTFDYDIALLELEKP 720

Qy  721 AEYSSMVRPICLPDASHVFPAGKAIWVTGWGHTQYGGTGALILQKGEIRVINQTTCENLL 780
Db  721 AEYSTVVRPICLPDNTHVFPAGKAIWVTGWGHTKEGGTGALILQKGEIRVINQTTCEELL 780

Qy  781 PQQITPRMMCVGFLSGGVDSCQGDSGGPLSSVEADGRIFQAGVVSWGDGCAQRNKPGVYT 840
Db  781 PQQITPRMMCVGFLSGGVDSCQGDSGGPLSSVEKDGRIFQAGVVSWGEGCAQRNKPGVYT 840

Qy  841 RLPLFRDWIKENTGV 855
Db  841 RIPEVRDWIKEQTGV 855
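
As a small illustration of how the "Matches" figure relates to an alignment block, the sketch below counts identical positions in one gap-free Qy/Db row pair, using the final block of this alignment (residues 841 to 855). The helper name `count_identities` is illustrative. The count covers identities only; classifying a substitution as "Conservative" would need a scoring matrix, which this report does not give.

```python
def count_identities(query_row, subject_row):
    """Count positions where two gap-free aligned rows carry the same residue."""
    if len(query_row) != len(subject_row):
        raise ValueError("aligned rows must have equal length")
    return sum(q == s for q, s in zip(query_row, subject_row))

if __name__ == "__main__":
    qy = "RLPLFRDWIKENTGV"  # Qy 841-855 (human query)
    db = "RIPEVRDWIKEQTGV"  # Db 841-855 (rat entry JC7731)
    print(count_identities(qy, db), "of", len(qy))  # 11 of 15
```

Summing such counts over all fifteen blocks should give the reported total of 693 matches, provided every row is transcribed correctly.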



Database : SPTREMBL_25:*
1: sp_archea:*
2: sp_bacteria:*
3: sp_fungi:*
4: sp_human:*
5: sp_invertebrate:*
6: sp_mammal:*
7: sp_mhc:*
8: sp_organelle:*
9: sp_phage:*
10: sp_plant:*
11: sp_rodent:*
12: sp_virus:*
13: sp_vertebrate:*
14: sp_unclassified:*
15: sp_rvirus:*
16: sp_bacteriap:*
17: sp_archeap:*



Pred. No. is the number of results predicted by chance to have a 
score greater than or equal to the score of the result being printed, 
and is derived by analysis of the total score distribution. 



SUMMARIES

                   %
Result           Query
   No.   Score   Match  Length  DB  ID       Description

     1    3883    83.0     855  11  Q9JJI7   Q9jji7 rattus norv
     2    2664    56.9     845  13  Q9DGR1   Q9dgr1 xenopus lae
     3    2379    50.8     422   4  Q8WVC1   Q8wvc1 homo sapien
     4  1011.5    21.6     572  11  Q8BIK6   Q8bik6 mus musculu
     5   717.5    15.3     855   4  Q7Z410   Q7z410 homo sapien
     6   717.5    15.3    1059   4  Q7Z411   Q7z411 homo sapien
     7   690.5    14.8    1111  11  Q80YN4   Q80yn4 rattus norv
     8     687    14.7     777  11  Q8CAN9   Q8can9 mus musculu
     9   644.5    13.8     767  13  Q9DGR2   Q9dgr2 xenopus lae
    10   636.5    13.6     680   5  Q868H7   Q868h7 branchiosto
    11   623.5    13.3     680   5  Q868H5   Q868h5 branchiosto
    12     617    13.2     581   5  Q9XZM7   Q9xzm7 strongyloce
    13     612    13.1     688   5  Q868H6   Q868h6 branchiosto
    14     601    12.8     490  11  Q7TN04   Q7tn04 mus musculu
    15     600    12.8     490  11  Q920K3   Q920k3 rattus norv






Database : Published_Applications_AA:*

1: /cgn2_6/ptodata/2/pubpaa/US07_PUBCOMB.pep:*
2: /cgn2_6/ptodata/2/pubpaa/PCT_NEW_PUB.pep:*
3: /cgn2_6/ptodata/2/pubpaa/US06_NEW_PUB.pep:*
4: /cgn2_6/ptodata/2/pubpaa/US06_PUBCOMB.pep:*
5: /cgn2_6/ptodata/2/pubpaa/US07_NEW_PUB.pep:*
6: /cgn2_6/ptodata/2/pubpaa/PCTUS_PUBCOMB.pep:*
7: /cgn2_6/ptodata/2/pubpaa/US08_NEW_PUB.pep:*
8: /cgn2_6/ptodata/2/pubpaa/US08_PUBCOMB.pep:*
9: /cgn2_6/ptodata/2/pubpaa/US09A_PUBCOMB.pep:*
10: /cgn2_6/ptodata/2/pubpaa/US09B_PUBCOMB.pep:*
11: /cgn2_6/ptodata/2/pubpaa/US09C_PUBCOMB.pep:*
12: /cgn2_6/ptodata/2/pubpaa/US09_NEW_PUB.pep:*
13: /cgn2_6/ptodata/2/pubpaa/US10A_PUBCOMB.pep:*
14: /cgn2_6/ptodata/2/pubpaa/US10B_PUBCOMB.pep:*
15: /cgn2_6/ptodata/2/pubpaa/US10C_PUBCOMB.pep:*
16: /cgn2_6/ptodata/2/pubpaa/US10_NEW_PUB.pep:*
17: /cgn2_6/ptodata/2/pubpaa/US60_NEW_PUB.pep:*
18: /cgn2_6/ptodata/2/pubpaa/US60_PUBCOMB.pep:*

Pred. No. is the number of results predicted by chance to have a 
score greater than or equal to the score of the result being printed, 
and is derived by analysis of the total score distribution. 

SUMMARIES

                   %
Result           Query
   No.   Score   Match  Length  DB  ID                  Description

     1    4681   100.0     855  10  US-09-776-191-2     Sequence 2, Appli
     2    4681   100.0     855  12  US-10-072-012-352   Sequence 352, App
     3    4681   100.0     855  12  US-10-072-012-411   Sequence 411, App
     4    4681   100.0     855  12  US-10-072-012-418   Sequence 418, App
     5    4681   100.0     855  14  US-10-099-700A-2    Sequence 2, Appli
     6    4681   100.0     855  14  US-10-190-030B-2    Sequence 2, Appli
     7    4681   100.0     855  14  US-10-302-840A-2    Sequence 2, Appli
     8    4681   100.0     855  14  US-10-267-219-2     Sequence 2, Appli
     9    4681   100.0     855  14  US-10-112-221A-2    Sequence 2, Appli
    10    4681   100.0     855  14  US-10-104-271-2     Sequence 2, Appli
    11    4681   100.0     855  15  US-10-147-211A-2    Sequence 2, Appli
    12    4681   100.0     855  15  US-10-156-214A-2    Sequence 2, Appli
    13    4681   100.0     855  16  US-10-600-187-2     Sequence 2, Appli
    14    4676    99.9     855  12  US-10-072-012-353   Sequence 353, App
    15    4676    99.9     855  12  US-10-072-012-412   Sequence 412, App
    16    4676    99.9     855  12  US-10-072-012-419   Sequence 419, App
    17    4676    99.9     855  15  US-10-295-027-1185  Sequence 1185, Ap
    18    4672    99.8     855  12  US-10-072-012-354   Sequence 354, App
    19    4672    99.8     855  12  US-10-072-012-420   Sequence 420, App
    20    4672    99.8     855  12  US-10-037-417-132   Sequence 132, App
    21    4631    98.9     851  12  US-10-276-774-1798  Sequence 1798, Ap
    22    4631    98.9     851  12  US-10-296-115-1143  Sequence 1143, Ap
    23  4175.5    89.2     782  14  US-10-097-340-312   Sequence 312, App
    24    4175    89.2     762  16  US-10-729-807-1     Sequence 1, Appli
    25    4111    87.8     757  12  US-10-072-012-44    Sequence 44, Appl
    26    3901    83.3     855   9  US-09-900-751-2     Sequence 2, Appli
    27    3901    83.3     855  12  US-10-072-012-355   Sequence 355, App
    28    3901    83.3     855  12  US-10-072-012-413   Sequence 413, App
    29    3883    83.0     855  12  US-10-072-012-356   Sequence 356, App
    30    3883    83.0     855  12  US-10-072-012-414   Sequence 414, App
    31    3883    83.0     855  12  US-10-072-012-417   Sequence 417, App
    32    3810    81.4     902  12  US-10-333-743-3     Sequence 3, Appli
    33    3810    81.4     902  16  US-10-600-187-10    Sequence 10, Appl
    34    3810    81.4     902  16  US-10-297-987B-11   Sequence 11, Appl
    35    2980    63.7     620   9  US-09-925-301-1193  Sequence 1193, Ap
    36    2664    56.9     845  12  US-10-072-012-415   Sequence 415, App
    37    1319    28.2     241  10  US-09-776-191-50    Sequence 50, Appl
    38    1319    28.2     241  14  US-10-099-700A-4    Sequence







Database : SwissProt_42 : * 

Pred. No. is the number of results predicted by chance to have a 
score greater than or equal to the score of the result being printed, 
and is derived by analysis of the total score distribution. 

SUMMARIES

                   %
Result           Query
   No.   Score   Match  Length  DB  ID         Description

     1    4676    99.9     855   1  ST14_HUMAN Q9y5y6 homo sapien
     2    3901    83.3     855   1  ST14_MOUSE P56677 mus musculu
     3  1124.5    24.0     811   1  TMS6_MOUSE Q9dbi0 mus musculu
     4    1124    24.0     811   1  TMS6_HUMAN Q8iu80 homo sapien
     5     727    15.5    1034   1  ENTK_PIG   P98074 sus scrofa
     6   712.5    15.2    1035   1  ENTK_BOVIN P98072 bos taurus
     7     692    14.8    1042   1  CORI_HUMAN Q9y5q5 homo sapien
     8   682.5    14.6    1019   1  ENTK_HUMAN P98073 homo sapien
     9   676.5    14.5    1069   1  ENTK_MOUSE P97435 mus musculu
    10   663.5    14.2    1113   1  CORI_MOUSE Q9z319 mus musculu
    11     600    12.8     490   1  TMS2_MOUSE Q9jiq8 mus musculu
    12     588    12.6     422   1  DES1_HUMAN Q9ul52 homo sapien
    13   586.5    12.5     704   1  CRAR_MOUSE P98064 mus musculu
    14     574    12.3     699   1  CRAR_HUMAN P48740 h complemen
    15   558.5    11.9     492   1  TMS2_HUMAN O15393 homo sapien
    16     546    11.7     453   1  TMS3_MOUSE Q8klt0 mus musculu
    17   533.5    11.4     638   1  KAL_MOUSE  P26262 mus musculu
    18     533    11.4     454   1  TMS3_HUMAN P57727 homo sapien
    19     518    11.1     603   1  CFAI_MOUSE Q61129 mus musculu
    20     518    11.1     604   1  CFAI_RAT   Q9wuw3 rattus norv
    21   514.5    11.0     638   1  KAL_RAT    P14272 rattus norv
    22     513    11.0     455   1  TMS5_MOUSE Q9er04 mus musculu
    23   511.5    10.9     418   1  HATT_HUMAN O60235 homo sapien



Database : A_Geneseq_29Jan04:*
1: geneseqp1980s:*
2: geneseqp1990s:*
3: geneseqp2000s:*
4: geneseqp2001s:*
5: geneseqp2002s:*
6: geneseqp2003as:*
7: geneseqp2003bs:*
8: geneseqp2004s:*



Pred. No. is the number of results predicted by chance to have a 
score greater than or equal to the score of the result being printed, 
and is derived by analysis of the total score distribution. 



SUMMARIES

                   %
Result           Query
   No.   Score   Match  Length  DB  ID       Description

     1    4681   100.0     855   2  AAY06671 Aay06671 Tumour an
     2    4681   100.0     855   4  AAB98500 Aab98500 Human TAD
     3    4681   100.0     855   4  AAE06930 Aae06930 Human mem
     4    4681   100.0     855   5  AAO22929 Aao22929 Type II t
     5    4681   100.0     855   6  ABP56619 Abp56619 Human mem
     6    4681   100.0     855   6  AAO30146 Aao30146 Human mem
     7    4681   100.0     855   6  AAE29820 Aae29820 Human mem
     8    4681   100.0     855   6  AAE29791 Aae29791 Human mem
     9    4681   100.0     855   6  ABP72376 Abp72376 Transmemb
    10    4681   100.0     855   7  ADB97551 Adb97551 Human MTS
    11    4676    99.9     855   3  AAB19552 Aab19552 Human mat
    12    4676    99.9     855   4  AAB35465 Aab35465 Human mem
    13    4631    98.9     851   4  AAM25628 Aam25628 Human pro
    14    4631    98.9     851   4  ABB11428 Abb11428 Human mem
    15    4319    92.3     932   4  ABG21442 Abg21442 Novel hum
    16  4175.5    89.2     782   5  ABG96427 Abg96427 Human ova
    17    4175    89.2     762   3  AAY90284 Aay90284 Human pep
    18    3901    83.3     855   5  AAE23083 Aae23083 Epithin p
    19    3810    81.4     902   4  AAB98507 Aab98507 Murine ep
    20    3810    81.4     902   5  AAU80517 Aau80517 Mouse epi
    21    3810    81.4     902   5  AAU77549 Aau77549 Murine ty
    22    3781    80.8     683   3  AAB19551 Aab19551 Human mat
    23    2980    63.7     620   3  AAB43748 Aab43748 Human can
    24    1352    28.9     362   4  ABG21441 Abg21441 Novel hum



Database : PIR_78:*

1: pir1:*
2: pir2:*
3: pir3:*
4: pir4:*

Pred. No. is the number of results predicted by chance to have a 
score greater than or equal to the score of the result being printed, 
and is derived by analysis of the total score distribution. 

SUMMARIES

                   %
Result           Query
   No.   Score   Match  Length  DB  ID     Description

     1    3883    83.0     855   2  JC7731 membrane-bound arg
     2     727    15.5    1034   1  A53663 enteropeptidase (E
     3   712.5    15.2    1035   1  A43090 enteropeptidase (E
     4   682.5    14.6    1019   1  A56318 enteropeptidase (E
     5   663.5    14.2    1113   2  JE0315 low-density lipopr
     6   578.5    12.4    1524   2  T30337 polyprotein - Afri
     7     574    12.3     699   1  154763 Ra-reactive factor
     8   533.5    11.4     638   1  KQMSPL plasma kallikrein
     9   514.5    11.0     638   1  KQRTPL plasma kallikrein
    10   509.5    10.9     790   1  PLPG   plasmin (EC 3.4.21
    11     506    10.8     613   2  S15468 complement C3b/C4b
    12     502    10.7     460   2  B61545 plasmin (EC 3.4.21
    13   501.5    10.7     786   1  A47547 serine proteinease
    14     500    10.7     638   1  KQHUP  plasma kallikrein
    15     497    10.6     810   1  PLHU   plasmin (EC 3.4.21
    16   492.5    10.5     583   2  A29154 complement factor
    17   491.5    10.5     812   1  PLMS   plasmin (EC 3.4.21
    18     491    10.5     416   1  KFBO   coagulation factor
    19   490.5    10.5     812   1  PLBO   plasmin (EC 3.4.21
    20     490    10.5     417   1  S00845 hepsin (EC 3.4.21



RESULT 1 
Q9JJI7 

ID Q9JJI7 PRELIMINARY; PRT; 855 AA.
AC Q9JJI7;
DT 01-OCT-2000 (TrEMBLrel. 15, Created)
DT 01-OCT-2000 (TrEMBLrel. 15, Last sequence update)
DT 01-OCT-2003 (TrEMBLrel. 25, Last annotation update)
DE Membrane bound serine protease (Membrane bound arginine specific
DE serine protease).
GN MBSP.
OS Rattus norvegicus (Rat).
OC Eukaryota; Metazoa; Chordata; Craniata; Vertebrata; Euteleostomi;
OC Mammalia; Eutheria; Rodentia; Sciurognathi; Muridae; Murinae; Rattus.
OX NCBI_TaxID=10116;
RN [1]
RP SEQUENCE FROM N.A.
RC STRAIN=WISTAR; TISSUE=Jejunum;
RA Tsuzuki S.;
RT "A membrane bound serine protease expressed in rat small intestine.";
RL Submitted (JAN-2000) to the EMBL/GenBank/DDBJ databases.
RN [2]
RP SEQUENCE FROM N.A.
RC STRAIN=Wistar; TISSUE=Duodenum;
RA Inoue H., Takahashi K., Kishi K.;
RT "membrane-bound arginine specific serine protease.";
RL Submitted (SEP-2000) to the EMBL/GenBank/DDBJ databases.

CC -!- SIMILARITY: BELONGS TO PEPTIDASE FAMILY S1.
CC -!- SIMILARITY: CONTAINS 2 CUB DOMAINS.
DR EMBL; AB037898; BAB03502.1; -.
DR EMBL; AB049189; BAB13765.1; -.
DR PIR; JC7731; JC7731.
DR HSSP; P00763; 1DPO.
DR MEROPS; S01.302; -.
DR GO; GO:0004263; F:chymotrypsin activity; IEA.
DR GO; GO:0008233; F:peptidase activity; IEA.
DR GO; GO:0004295; F:trypsin activity; IEA.
DR GO; GO:0006508; P:proteolysis and peptidolysis; IEA.
DR InterPro; IPR000859; CUB.
DR InterPro; IPR009003; Cys_Ser_trypsin.
DR InterPro; IPR002172; LDL_receptor_A.
DR InterPro; IPR001254; Peptidase_S1.
DR InterPro; IPR001314; Peptidase_S1A.
DR Pfam; PF00431; CUB; 2.
DR Pfam; PF00057; ldl_recept_a; 4.
DR Pfam; PF00089; trypsin; 1.
DR PRINTS; PR00722; CHYMOTRYPSIN.
DR PRINTS; PR00261; LDLRECEPTOR.
DR SMART; SM00042; CUB; 2.
DR SMART; SM00192; LDLa; 3.
DR SMART; SM00020; Tryp_SPc; 1.
DR PROSITE; PS01180; CUB; 2.
DR PROSITE; PS01209; LDLRA_1; 2.
DR PROSITE; PS50068; LDLRA_2; 4.
DR PROSITE; PS50240; TRYPSIN_DOM; 1.
DR PROSITE; PS00134; TRYPSIN_HIS; 1.
DR PROSITE; PS00135; TRYPSIN_SER; 1.
KW Hydrolase; Protease; Serine protease.




FT VARIANT 665 665 K -> N.
SQ SEQUENCE 855 AA; 94955 MW; 35806B7ECF6CF03D CRC64;




Query Match 83.0%; Score 3883; DB 11; Length 855;
Best Local Similarity 81.1%; Pred. No. 0;
Matches 693; Conservative 79; Mismatches 83; Indels 0; Gaps 0;
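The similarity percentage reported with the match counts can be cross-checked by hand. The following is a minimal Python sketch under one stated assumption: that Best Local Similarity is the number of identical positions divided by the aligned length (matches + conservative + mismatches + indels). That definition and the function name `best_local_similarity` are illustrative and are not stated in the report itself.

```python
def best_local_similarity(matches: int, conservative: int,
                          mismatches: int, indels: int = 0) -> float:
    """Return percent identity over the aligned length, rounded to 0.1.

    Assumption (not from the report): aligned length is the sum of
    matches, conservative substitutions, mismatches and indels.
    """
    aligned_length = matches + conservative + mismatches + indels
    if aligned_length == 0:
        raise ValueError("alignment has no positions")
    return round(100.0 * matches / aligned_length, 1)


# Counts reported for the 855-residue alignment of RESULT 1 (Q9JJI7).
print(best_local_similarity(693, 79, 83))  # prints 81.1
```

Under that assumption the computed value agrees with the 81.1% printed in the report, which suggests the counts and the percentage are internally consistent.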


Qy 1 MGSDRARKGGGGPKDFGAGLKYNSRHEKVNGLEEGVEFLPVNNVKKVEKHGPGRWVVLAA 60
Db 1 MGNNRGRKAGGGSQDFGAGLKYNSRLENMNGFEEGVEFLPVNNAKQVEKRGPRRWVVMVA 60

Qy 61 VLIGLLLVLLGIGFLVWHLQYRDVRVQKVFNGYMRITNENFVDAYENSNSTEFVSLASKV 120
Db 61 WFSFLLLSLMAGLLWHFHYRNVRIQKVFNGHLRITNENFLDAYENSTSTEFIbLAbQV 120

Qy 121 KDALKLLYSGVPFLGPYHKESAVTAFSEGSVIAYYWSEFSIPQHLVEEAERVMAEERVVM 180
Db 121 KEALKLMYSEVPVLGPYHKKSTVTAFSEGSVIAYYWSEFSIPPHLEEEVTlRAjyiAVERVVT 180

Qy 181 LPPRARSLKSFVVTSVVAFPTDSKTVQRTQDNSCSFGLHARGVELMRFTTPGFPDSPYPA 240
Db 181 LPPRARALKSFVLTSWAFPIDPRMLQRTQDNSCSFALHARGRTVTRFTTPGFPNSPYPA 240

Qy 241 HARCQWALRGDADSVLSLTFRSFDLASCDERGSDLVTVYNTLSPMEPHALVQLCGTYPPS 300
Db 241 HARCQWVLRGDADSVLSLTFRSFDVAPCDGHDSDLVTVYDSLSPMEPHAWRLCGTFSPS 300

Qy 301 YNLTFHSSQNVLLITLITNTERRHPGFEATFFQLPRMSSCGGRLRKAQGTFNSPYYPGHY 360
Db 301 YNLTFLSSQNVFLVTLITNTDRRHPGFEATFFQLPKMSSCGGLLSEAQGTFSSPYYPGHY 360

Qy 361 PPNIDCTWNIEVPNNQHVKVSFKFFYLLEPGVPAGTCPKDYVEINGEKYCGERSQFVVTS 420
Db 361 PPNINCTWNIKVPNNRNVKvT^FKLFYLVI)PNIPVGSCTKDYVEINGEKFCGERSQFV 420

Qy 421 NSNKITVRFHSDQSYTDTGFLAEYLSYDSSDPCPGQFTCRTGRCIRKELRCDGWADCTDH 480
Db 421 NSSKITVHFHSDHSYTDTGFLAEYLSYDSNDPCPGMFMCKTGRCIRKDLRCDGWADCPDY 480

Qy 481 SDELNCSCDAGHQFTCKNKFCKPLFWVCDSVNDCGDNSDEQGCSCPAQTFRCSNGKCLSK 540
Db 481 SDERHCRCNATHQFMCKNQFCKPLFWVCDSVNDCGDGSDEEGCSCPAGSFKCSNGKCLPQ 540

Qy 541 SQQCNGKDDCGDGSDEASCPKVNVVTCTKHTYRCLNGLCLSKGNPECDGKEDCSDGSDEK 600
Db 541 SQQCNGKDDCGDGSDEASCDNVNAVSCTKYTYRCQNGLCLNKGNPECDGKKDCSDGSDEK 600

Qy 601 DCDCGLRSFTRQARVVGGTDADEGEWPWQVSLHALGQGHICGASLISPNWLVSAAHCYID 660
Db 601 NCDCGLRSFTKQARWGGTNADEGEWPWQVSLHALGQGHLCGASLISPDWLVSAAHCFQD 660

Qy 661 DRGFRYSDPTQWTAFLGLHDQSQRSAPGVQERRLKRIISHPFFNDFTFDYDIALLELEKP 720
Db 661 ETIFKYSDHTMWTAFLGLLDQSKRSASGVQEHKLKRIITHPSFNDFTFDYDIALLELEKP 720

Qy 721 AEYSSMVRPICLPDASHVFPAGKAIWVTGWGHTQYGGTGALILQKGEIRVINQTTCENLL 780
Db 721 AEYSTWRPICLPDNTHVFPAGKAIWVTGWGHTKEGGTGALILQKGEIRVINQTTCEELL 780

Qy 781 PQQITPRMMCVGFLSGGVDSCQGDSGGPLSSVEADGRIFQAGVVSWGDGCAQRNKPGVYT 840
Db 781 PQQITPRMMCVGFLSGGVDSCQGDSGGPLSSVEKDGRIFQAGWSWGEGCAQRNKPGVYT 840

Qy 841 RLPLFRDWIKENTGV 855
Db 841 RIPEVRDWIKEQTGV 855



RESULT 2 
ST14_MOUSE

ID ST14_MOUSE STANDARD; PRT; 855 AA.
AC P56677;
DT 15-JUL-1999 (Rel. 38, Created)
DT 16-OCT-2001 (Rel. 40, Last sequence update)
DT 10-OCT-2003 (Rel. 42, Last annotation update)
DE Suppressor of tumorigenicity 14 (EC 3.4.21.-) (Epithin).
GN ST14 OR PRSS14.
OS Mus musculus (Mouse).
OC Eukaryota; Metazoa; Chordata; Craniata; Vertebrata; Euteleostomi;
OC Mammalia; Eutheria; Rodentia; Sciurognathi; Muridae; Murinae; Mus.
OX NCBI_TaxID=10090;
RN [1]
RP SEQUENCE FROM N.A.
RC STRAIN=C.B. 17SCID; TISSUE=Thymus;
RX MEDLINE=99216440; PubMed=10199918;
RA Kim M.G., Chen C., Lyu M.S., Cho E.G., Park D., Kozak C.,
RA Schwartz R.H.;
RT "Cloning and chromosomal mapping of a gene isolated from thymic
RT stromal cells encoding a new mouse type II membrane serine protease,
RT epithin, containing four LDL receptor modules and two CUB domains.";
RL Immunogenetics 49:420-428(1999).
RN [2]
RP REVISIONS TO 23; 321; 325; 343; 409-410 AND C-TERMINUS.
RC STRAIN=C.B. 17SCID; TISSUE=Thymus;
RA Kim M.G., Chen C., Cho E.G., Park D., Schwartz R.H.;
RL Submitted (MAR-2000) to the EMBL/GenBank/DDBJ databases.
RN [3]
RP SEQUENCE FROM N.A.
RC TISSUE=Breast tumor;
RX MEDLINE=22388257; PubMed=12477932;
RA Strausberg R.L., Feingold E.A., Grouse L.H., Derge J.G.,
RA Klausner R.D., Collins F.S., Wagner L., Shenmen C.M., Schuler G.D.,
RA Altschul S.F., Zeeberg B., Buetow K.H., Schaefer C.F., Bhat N.K.,
RA Hopkins R.F., Jordan H., Moore T., Max S.I., Wang J., Hsieh F.,
RA Diatchenko L., Marusina K., Farmer A.A., Rubin G.M., Hong L.,
RA Stapleton M., Soares M.B., Bonaldo M.F., Casavant T.L., Scheetz T.E.,
RA Brownstein M.J., Usdin T.B., Toshiyuki S., Carninci P., Prange C.,
RA Raha S.S., Loquellano N.A., Peters G.J., Abramson R.D., Mullahy S.J.,
RA Bosak S.A., McEwan P.J., McKernan K.J., Malek J.A., Gunaratne P.H.,
RA Richards S., Worley K.C., Hale S., Garcia A.M., Gay L.J., Hulyk S.W.,
RA Villalon D.K., Muzny D.M., Sodergren E.J., Lu X., Gibbs R.A.,
RA Fahey J., Helton E., Ketteman M., Madan A., Rodrigues S., Sanchez A.,
RA Whiting M., Madan A., Young A.C., Shevchenko Y., Bouffard G.G.,
RA Blakesley R.W., Touchman J.W., Green E.D., Dickson M.C.,
RA Rodriguez A.C., Grimwood J., Schmutz J., Myers R.M.,
RA Butterfield Y.S.N., Krzywinski M.I., Skalska U., Smailus D.E.,
RA Schnerch A., Schein J.E., Jones S.J.M., Marra M.A.;
RT "Generation and initial analysis of more than 15,000 full-length
RT human and mouse cDNA sequences.";
RL Proc. Natl. Acad. Sci. U.S.A. 99:16899-16903(2002).
CC -!- SUBCELLULAR LOCATION: Type II membrane protein (Probable).
CC -!- TISSUE SPECIFICITY: Highly expressed in intestine, kidney, lung,
CC and thymus. Not expressed in skeletal muscle, liver, heart,
CC testis and brain.
CC -!- SIMILARITY: Belongs to peptidase family S1.




CC -!- SIMILARITY: Contains 2 CUB domains.
CC -!- SIMILARITY: Contains 4 LDL-receptor class A domains.
CC This SWISS-PROT entry is copyright. It is produced through a collaboration
CC between the Swiss Institute of Bioinformatics and the EMBL outstation -
CC the European Bioinformatics Institute. There are no restrictions on its
CC use by non-profit institutions as long as its content is in no way
CC modified and this statement is not removed. Usage by and for commercial
CC entities requires a license agreement (See http://www.isb-sib.ch/announce/
CC or send an email to license@isb-sib.ch).
DR EMBL; AF042822; AAD02230.3; -.
DR EMBL; BC005496; AAH05496.1; -.
DR HSSP; P20231; 1AA0.
DR MEROPS; S01.302; -.
DR MGD; MGI:1338881; St14.
DR GO; GO:0005576; C:extracellular; IDA.
DR GO; GO:0019897; C:extrinsic to plasma membrane; IDA.
DR GO; GO:0008236; F:serine-type peptidase activity; IDA.
DR InterPro; IPR000859; CUB.
DR InterPro; IPR009003; Cys_Ser_trypsin.
DR InterPro; IPR002172; LDL_receptor_A.
DR InterPro; IPR001254; Peptidase_S1.
DR InterPro; IPR001314; Peptidase_S1A.
DR Pfam; PF00431; CUB; 2.
DR Pfam; PF00057; ldl_recept_a; 4.
DR Pfam; PF00089; trypsin; 1.
DR PRINTS; PR00722; CHYMOTRYPSIN.
DR PRINTS; PR00261; LDLRECEPTOR.
DR SMART; SM00042; CUB; 2.
DR SMART; SM00192; LDLa; 4.
DR SMART; SM00020; Tryp_SPc; 1.
DR PROSITE; PS01180; CUB; 2.
DR PROSITE; PS01209; LDLRA_1; 2.
DR PROSITE; PS50068; LDLRA_2; 4.
DR PROSITE; PS50240; TRYPSIN_DOM; 1.
DR PROSITE; PS00134; TRYPSIN_HIS; 1.
DR PROSITE; PS00135; TRYPSIN_SER; 1.
KW Signal-anchor; Glycoprotein; Hydrolase; Serine protease;
KW Transmembrane; Repeat.
FT DOMAIN 1 55 CYTOPLASMIC (POTENTIAL).
FT TRANSMEM 56 76 SIGNAL-ANCHOR (TYPE-II MEMBRANE PROTEIN)
FT (POTENTIAL).
FT DOMAIN 77 855 EXTRACELLULAR (POTENTIAL).
FT DOMAIN 214 331 CUB 1.
FT DOMAIN 340 444 CUB 2.
FT DOMAIN 451 488 LDL-RECEPTOR CLASS A 1.
FT DOMAIN 489 522 LDL-RECEPTOR CLASS A 2.
FT DOMAIN 523 561 LDL-RECEPTOR CLASS A 3.
FT DOMAIN 565 604 LDL-RECEPTOR CLASS A 4.
FT DOMAIN 615 854 SERINE PROTEASE.
FT ACT_SITE 656 656 CHARGE RELAY SYSTEM (BY SIMILARITY).
FT ACT_SITE 711 711 CHARGE RELAY SYSTEM (BY SIMILARITY).
FT ACT_SITE 805 805 CHARGE RELAY SYSTEM (BY SIMILARITY).
FT CARBOHYD 107 107 N-LINKED (GLCNAC...) (POTENTIAL).
FT CARBOHYD 302 302 N-LINKED (GLCNAC...) (POTENTIAL).
FT CARBOHYD 365 365 N-LINKED (GLCNAC...) (POTENTIAL).
FT CARBOHYD 421 421 N-LINKED (GLCNAC...) (POTENTIAL).
FT CARBOHYD 489 489 N-LINKED (GLCNAC...) (POTENTIAL).
FT CARBOHYD 772 772 N-LINKED (GLCNAC...) (POTENTIAL).
SQ SEQUENCE 855 AA; 94654 MW; 4F10E84DA2146DD5 CRC64;

Query Match 83.3%; Score 3901; DB 1; Length 855;
Best Local Similarity 81.8%; Pred. No. 6.2e-261;
Matches 699; Conservative 73; Mismatches 83; Indels 0; Gaps 0;
Qy 1 MGSDRARKGGGGPKDFGAGLKYNSRHEKVNGLEEGVEFLPVNNVKKVEKHGPGRWVVLAA 60
Db 1 MGSNRGRKAGGGSQDFGAGLKYNSRLENMNGFEEG 60

Qy 61 VLIGLLLVLLGIGFLVWHLQYRDVRVQKVFNGYMRITNENFVDAYENSNSTEFVSLASKV 120
Db 61 VLFSFLLLSLMAGLLWHFHYRN^ 120

Qy 121 KDALKLLYSGVPFLGPYHKESAVTAFSEGSVIAYYWSEFSIPQHLVEEAERVMAEERVVM 180
Db 121 KEALKLLYNEVPVLGPYHKKSAVTAFSEGSVIAYYWSEFSIPPHLAEEVDRAMAVERWT 180

Qy 181 LPPRARSLKSFVVTSVVAFPTDSKTVQRTQDNSCSFGLHARGVELMRFTTPGFPDSPYPA 240
Db 181 LPPRARALKSFVLTSWAFPIDPRMLQRTQDNSCSFALHAHGAAVTRFTTPGFPNSPYPA 240

Qy 241 HARCQWALRGDADSVLSLTFRSFDLASCDERGSDLVTVYNTLSPMEPHALVQLCGTYPPS 300
Db 241 HARCQWVLRGDADSVLSLTFRSFDVAPCDEHGSDLVTVYDSLSPMEPHAWRLCGTFSPS 300

Qy 301 YNLTFHSSQNVLLITLITNTERRHPGFEATFFQLPRMSSCGGRLRKAQGTFNSPYYPGHY 360
Db 301 YNLTFLSSQWF 360

Qy 361 PPNIDCTWNIEVPNNQHVKVSFKFFYLLEPGVPAGTCPKDYVEINGEKYCGERSQFVVTS 420
Db 361 PPNINCTWNIKVPNNRNVKVRFKLFYLVDPNVPVGSCTKDYVEINGEKYCGERSQFWSS 420

Qy 421 NSNKITVRFHSDQSYTDTGFLAEYLSYDSSDPCPGQFTCRTGRCIRKELRCDGWADCTDH 480
Db 421 NSSKITVHFHSDHSYTDTGFLAEYLSYDSNDPCPGMFMCKTGRCIRKELRCDGWADCPDY 480

Qy 481 SDELNCSCDAGHQFTCKNKFCKPLFWVCDSVNDCGDNSDEQGCSCPAQTFRCSNGKCLSK 540
Db 481 SDERYCRCNATHQFTCKNQFCKPLFWVCDSVNDCGDGSDEEGCSCPAGSFKCSNGKCLPQ 540

Qy 541 SQQCNGKDDCGDGSDEASCPKVNVVTCTKHTYRCLNGLCLSKGNPECDGKEDCSDGSDEK 600
Db 541 SQKCNGKDNCGDGSDEASCDSVNWSCTKYTYRCQNGLCLSKGNPECDGKTDCSDGSDEK 600

Qy 601 DCDCGLRSFTRQARVVGGTDADEGEWPWQVSLHALGQGHICGASLISPNWLVSAAHCYID 660
Db 601 NCDCGLRSFTKQARWGGTNADEGEWPWQVSLHALGQGHLCGASLISPDWLVSAAHCFQD 660

Qy 661 DRGFRYSDPTQWTAFLGLHDQSQRSAPGVQERRLKRIISHPFFNDFTFDYDIALLELEKP 720
Db 661 DKNFKYSDYTMWTAFLGLLDQSKRSASGVQELKLKRIITHPSFNDFTFDYDIALLELEKS 720

Qy 721 AEYSSMVRPICLPDASHVFPAGKAIWVTGWGHTQYGGTGALILQKGEIRVINQTTCENLL 780
Db 721 VEYSTWRPICLPDATHVFPAGKAIWVTGWGHTKEGGTGALILQKGEIRVINQTTCEDLM 780

Qy 781 PQQITPRMMCVGFLSGGVDSCQGDSGGPLSSVEADGRIFQAGVVSWGDGCAQRNKPGVYT 840
Db 781 PQQITPRMMCVGFLSGGVDSCQGDSGGPLSSAEKDGRMFQAGWSWGEGCAQRNKPGVYT 840

Qy 841 RLPLFRDWIKENTGV 855
Db 841 RLPWRDWIKEHTGV 855



RESULT 1 
ST14_HUMAN 

ID ST14_HUMAN STANDARD; PRT; 855 AA.
AC Q9Y5Y6; Q9BS01; Q9H3S0; Q9HB36; Q9HCA3;
DT 16-OCT-2001 (Rel. 40, Created)
DT 16-OCT-2001 (Rel. 40, Last sequence update)
DT 10-OCT-2003 (Rel. 42, Last annotation update)
DE Suppressor of tumorigenicity 14 (EC 3.4.21.-) (Matriptase) (Membrane-
DE type serine protease 1) (MT-SP1) (Prostamin) (Serine protease TADG-15)
DE (Tumor associated differentially-expressed gene-15 protein).
GN ST14 OR PRSS14 OR SNC19 OR TADG15.
OS Homo sapiens (Human).
OC Eukaryota; Metazoa; Chordata; Craniata; Vertebrata; Euteleostomi;
OC Mammalia; Eutheria; Primates; Catarrhini; Hominidae; Homo.
OX NCBI_TaxID=9606;
RN [1]
RP SEQUENCE FROM N.A.
RX MEDLINE=99303581; PubMed=10373424;
RA Lin C.Y., Anders J., Johnson M., Sang Q.A., Dickson R.B.;
RT "Molecular cloning of cDNA for matriptase, a matrix-degrading serine
RT protease with trypsin-like activity.";
RL J. Biol. Chem. 274:18231-18236(1999).
RN [2]
RP SEQUENCE FROM N.A.
RX MEDLINE=99432178; PubMed=10500122;
RA Takeuchi T., Shuman M.A., Craik C.S.;
RT "Reverse biochemistry: Use of macromolecular protease inhibitors to
RT dissect complex biological processes and identify a membrane-type
RT serine protease in epithelial cancer and normal tissue.";
RL Proc. Natl. Acad. Sci. U.S.A. 96:11054-11061(1999).
RN [3]
RP SEQUENCE FROM N.A.
RC TISSUE=Prostate;
RA Yamaguchi N., Mitsui S.;
RT "Molecular cloning of a novel transmembrane serine protease expressed
RT in human prostate.";
RL Submitted (JUL-1999) to the EMBL/GenBank/DDBJ databases.
RN [4]
RP SEQUENCE FROM N.A.
RA Tanimoto H., Underwood L.J., Wang Y., Shigemasa K., Parmley T.H.,
RA O'Brien T.J.;
RL Submitted (APR-1998) to the EMBL/GenBank/DDBJ databases.
RN [5]
RP SEQUENCE FROM N.A.
RC TISSUE=Blood, and Muscle;
RX MEDLINE=22388257; PubMed=12477932;
RA Strausberg R.L., Feingold E.A., Grouse L.H., Derge J.G.,
RA Klausner R.D., Collins F.S., Wagner L., Shenmen C.M., Schuler G.D.,
RA Altschul S.F., Zeeberg B., Buetow K.H., Schaefer C.F., Bhat N.K.,
RA Hopkins R.F., Jordan H., Moore T., Max S.I., Wang J., Hsieh F.,
RA Diatchenko L., Marusina K., Farmer A.A., Rubin G.M., Hong L.,
RA Stapleton M., Soares M.B., Bonaldo M.F., Casavant T.L., Scheetz T.E.,
RA Brownstein M.J., Usdin T.B., Toshiyuki S., Carninci P., Prange C.,
RA Raha S.S., Loquellano N.A., Peters G.J., Abramson R.D., Mullahy S.J.,
RA Bosak S.A., McEwan P.J., McKernan K.J., Malek J.A., Gunaratne P.H.,
RA Richards S., Worley K.C., Hale S., Garcia A.M., Gay L.J., Hulyk S.W.,
RA Villalon D.K., Muzny D.M., Sodergren E.J., Lu X., Gibbs R.A.,
RA Fahey J., Helton E., Ketteman M., Madan A., Rodrigues S., Sanchez A.,
RA Whiting M., Madan A., Young A.C., Shevchenko Y., Bouffard G.G.,
RA Blakesley R.W., Touchman J.W., Green E.D., Dickson M.C.,
RA Rodriguez A.C., Grimwood J., Schmutz J., Myers R.M.,
RA Butterfield Y.S.N., Krzywinski M.I., Skalska U., Smailus D.E.,
RA Schnerch A., Schein J.E., Jones S.J.M., Marra M.A.;
RT "Generation and initial analysis of more than 15,000 full-length
RT human and mouse cDNA sequences.";
RL Proc. Natl. Acad. Sci. U.S.A. 99:16899-16903(2002).
RN [6]
RP SEQUENCE OF 340-664 FROM N.A.
RA Cao J., Fan W., Zheng S.;
RT "Genomic analysis of a novel human serine protease SNC19.";
RL Submitted (JUN-2000) to the EMBL/GenBank/DDBJ databases.
RN [7]
RP CHARACTERIZATION.
RC TISSUE=Milk;
RX MEDLINE=99303582; PubMed=10373425;
RA Lin C.Y., Anders J., Johnson M., Dickson R.B.;
RT "Purification and characterization of a complex containing matriptase
RT and a Kunitz-type serine protease inhibitor from human milk.";
RL J. Biol. Chem. 274:18237-18242(1999).
CC -!- FUNCTION: Degrades extracellular matrix. Proposed to play a role
CC in breast cancer invasion and metastasis. Exhibits trypsin-like
CC activity as defined by cleavage of synthetic substrates with Arg
CC or Lys as the P1 site.
CC -!- SUBCELLULAR LOCATION: Type II membrane protein (Probable).
CC -!- SIMILARITY: Belongs to peptidase family S1.
CC -!- SIMILARITY: Contains 2 CUB domains.
CC -!- SIMILARITY: Contains 4 LDL-receptor class A domains.


DR EMBL; AF118224; AAD42765.2; -.
DR EMBL; AF133086; AAF00109.1; -.
DR EMBL; AB030036; BAB20376.1; -.
DR EMBL; AF057145; AAG15395.1; -.
DR EMBL; BC005826; AAH05826.1; -.
DR EMBL; BC030532; AAH30532.1; -.
DR EMBL; AF283256; AAG13949.1; -.
DR HSSP; P00763; 1DPO.
DR Genew; HGNC:11344; ST14.
DR MIM; 606797; -.
DR MEROPS; S01.302; -.
DR GO; GO:0005887; C:integral to plasma membrane; TAS.
DR GO; GO:0008236; F:serine-type peptidase activity; TAS.
DR GO; GO:0006508; P:proteolysis and peptidolysis; TAS.
DR InterPro; IPR000859; CUB.
DR InterPro; IPR009003; Cys_Ser_trypsin.
DR InterPro; IPR002172; LDL_receptor_A.
DR InterPro; IPR001254; Peptidase_S1.
DR InterPro; IPR001314; Peptidase_S1A.
DR Pfam; PF00431; CUB; 2.
DR Pfam; PF00057; ldl_recept_a; 4.
DR Pfam; PF00089; trypsin; 1.
DR PRINTS; PR00722; CHYMOTRYPSIN.
DR PRINTS; PR00261; LDLRECEPTOR.
DR SMART; SM00042; CUB; 2.
DR SMART; SM00192; LDLa; 3.
DR SMART; SM00020; Tryp_SPc; 1.
DR PROSITE; PS01180; CUB; 2.
DR PROSITE; PS01209; LDLRA_1; 2.
DR PROSITE; PS50068; LDLRA_2; 4.
DR PROSITE; PS50240; TRYPSIN_DOM; 1.
DR PROSITE; PS00134; TRYPSIN_HIS; 1.
DR PROSITE; PS00135; TRYPSIN_SER; 1.
KW Signal-anchor; Glycoprotein; Hydrolase; Serine protease;
KW Transmembrane; Repeat.
CYTOPLASMIC (POTENTIAL).
SIGNAL-ANCHOR (TYPE-II MEMBRANE PROTEIN)
(POTENTIAL).
EXTRACELLULAR (POTENTIAL).
CUB 1.
CUB 2.
LDL-RECEPTOR CLASS A 1.
LDL-RECEPTOR CLASS A 2.
LDL-RECEPTOR CLASS A 3.
LDL-RECEPTOR CLASS A 4.
615 854 SERINE PROTEASE.
CHARGE RELAY SYSTEM (BY SIMILARITY).
CHARGE RELAY SYSTEM (BY SIMILARITY).
CHARGE RELAY SYSTEM (BY SIMILARITY).
N-LINKED (GLCNAC...) (POTENTIAL).
CARBOHYD 302 302 N-LINKED (GLCNAC...) (POTENTIAL).
485 485 N-LINKED (GLCNAC...) (POTENTIAL).
CARBOHYD 772 N-LINKED (GLCNAC...) (POTENTIAL).
FEA -> GTR (IN REF. 5; AAH05826).
CONFLICT 381 381 R -> S (IN REF. 4).
A -> V (IN REF. 3).
SQ SEQUENCE 855 AA; 94769 MW; 26143132C01F99C9 CRC64;



Query Match 99.9%; Score 4676; DB 1; Length 855; 

Best Local Similarity 99.9%; Pred. No. 2.7e-314; 

Matches 854; Conservative 0; Mismatches 1; Indels 0; Gaps 0; 
Qy 1 MGSDRARKGGGGPKDFGAGLKYNSRHEKVNGLEEGVEFLPVNNVKKVEKHGPGRWVVLAA 60
     ||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||
Db 1 MGSDRARKGGGGPKDFGAGLKYNSRHEKVNGLEEGVEFLPVNNVKKVEKHGPGRWVVLAA 60

Qy 61 VLIGLLLVLLGIGFLVWHLQYRDVRVQKVFNGYMRITNENFVDAYENSNSTEFVSLASKV 120
      ||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||
Db 61 VLIGLLLVLLGIGFLVWHLQYRDVRVQKVFNGYMRITNENFVDAYENSNSTEFVSLASKV 120

Qy 121 KDALKLLYSGVPFLGPYHKESAVTAFSEGSVIAYYWSEFSIPQHLVEEAERVMAEERVVM 180
       ||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||
Db 121 KDALKLLYSGVPFLGPYHKESAVTAFSEGSVIAYYWSEFSIPQHLVEEAERVMAEERVVM 180

Qy 181 LPPRARSLKSFVVTSVVAFPTDSKTVQRTQDNSCSFGLHARGVELMRFTTPGFPDSPYPA 240
       ||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||
Db 181 LPPRARSLKSFVVTSVVAFPTDSKTVQRTQDNSCSFGLHARGVELMRFTTPGFPDSPYPA 240

Qy 241 HARCQWALRGDADSVLSLTFRSFDLASCDERGSDLVTVYNTLSPMEPHALVQLCGTYPPS 300
       ||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||
Db 241 HARCQWALRGDADSVLSLTFRSFDLASCDERGSDLVTVYNTLSPMEPHALVQLCGTYPPS 300

Qy 301 YNLTFHSSQNVLLITLITNTERRHPGFEATFFQLPRMSSCGGRLRKAQGTFNSPYYPGHY 360
       ||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||
Db 301 YNLTFHSSQNVLLITLITNTERRHPGFEATFFQLPRMSSCGGRLRKAQGTFNSPYYPGHY 360

Qy 361 PPNIDCTWNIEVPNNQHVKVSFKFFYLLEPGVPAGTCPKDYVEINGEKYCGERSQFVVTS 420
       |||||||||||||||||||| |||||||||||||||||||||||||||||||||||||||
Db 361 PPNIDCTWNIEVPNNQHVKVRFKFFYLLEPGVPAGTCPKDYVEINGEKYCGERSQFVVTS 420

Qy 421 NSNKITVRFHSDQSYTDTGFLAEYLSYDSSDPCPGQFTCRTGRCIRKELRCDGWADCTDH 480
       ||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||
Db 421 NSNKITVRFHSDQSYTDTGFLAEYLSYDSSDPCPGQFTCRTGRCIRKELRCDGWADCTDH 480

Qy 481 SDELNCSCDAGHQFTCKNKFCKPLFWVCDSVNDCGDNSDEQGCSCPAQTFRCSNGKCLSK 540
       ||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||
Db 481 SDELNCSCDAGHQFTCKNKFCKPLFWVCDSVNDCGDNSDEQGCSCPAQTFRCSNGKCLSK 540

Qy 541 SQQCNGKDDCGDGSDEASCPKVNVVTCTKHTYRCLNGLCLSKGNPECDGKEDCSDGSDEK 600
       ||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||
Db 541 SQQCNGKDDCGDGSDEASCPKVNVVTCTKHTYRCLNGLCLSKGNPECDGKEDCSDGSDEK 600

Qy 601 DCDCGLRSFTRQARVVGGTDADEGEWPWQVSLHALGQGHICGASLISPNWLVSAAHCYID 660
       ||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||
Db 601 DCDCGLRSFTRQARVVGGTDADEGEWPWQVSLHALGQGHICGASLISPNWLVSAAHCYID 660

Qy 661 DRGFRYSDPTQWTAFLGLHDQSQRSAPGVQERRLKRIISHPFFNDFTFDYDIALLELEKP 720
       ||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||
Db 661 DRGFRYSDPTQWTAFLGLHDQSQRSAPGVQERRLKRIISHPFFNDFTFDYDIALLELEKP 720

Qy 721 AEYSSMVRPICLPDASHVFPAGKAIWVTGWGHTQYGGTGALILQKGEIRVINQTTCENLL 780
       ||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||
Db 721 AEYSSMVRPICLPDASHVFPAGKAIWVTGWGHTQYGGTGALILQKGEIRVINQTTCENLL 780

Qy 781 PQQITPRMMCVGFLSGGVDSCQGDSGGPLSSVEADGRIFQAGVVSWGDGCAQRNKPGVYT 840
       ||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||
Db 781 PQQITPRMMCVGFLSGGVDSCQGDSGGPLSSVEADGRIFQAGVVSWGDGCAQRNKPGVYT 840

Qy 841 RLPLFRDWIKENTGV 855
       |||||||||||||||
Db 841 RLPLFRDWIKENTGV 855



